1. Introduction

In nonparametric statistics, many statisticians are interested in the
estimation of a function from noisy observations. In this setting, one looks
for data-driven procedures that perform very well, that is to say, procedures
whose estimates are very close to the target function. To reach this goal, a
criterion is needed to measure the performance of any procedure. One of the
most usual ways to measure this performance is to evaluate the maximum risk of
the procedure over a functional space $\mathcal{F}$ to which the unknown signal
is supposed to belong. In the $L_2$ case, the maximum risk of any estimation
procedure $\hat f$ on $\mathcal{F}$ is the quantity

$$R_{\mathcal{F}}(\hat f, \varepsilon) := \sup_{f \in \mathcal{F}} \mathbb{E}\|\hat f - f\|_2^2,$$

where $\varepsilon > 0$ is the noise level. In the minimax setting, the main
goal is to provide procedures which are as close as possible to the
$\mathcal{F}$-minimax rate $\rho_{\mathcal{F}}$ defined for any
$\varepsilon > 0$ by

$$\rho_{\mathcal{F}}(\varepsilon) := \inf_{\hat f} R_{\mathcal{F}}(\hat f, \varepsilon) = \inf_{\hat f} \sup_{f \in \mathcal{F}} \mathbb{E}\|\hat f - f\|_2^2,$$

where the infimum is taken over all the data-driven procedures. Minimax
theory has been extensively developed since the 1980s. Many minimax results
have been obtained for Sobolev classes, Hölder classes and Besov classes.

Nevertheless, the minimax approach is arguably not realistic, since it
requires the statistician to know the functional space $\mathcal{F}$
containing the unknown target function. Hence this point of view seems quite
subjective and debatable. Moreover, building an estimator adapted to the worst
functions of $\mathcal{F}$ is not what applied statisticians are especially
interested in.

Keeping in mind these minimax drawbacks, Cohen, De Vore, Kerkyacharian
and Picard [6] have suggested an alternative approach to measure the
performance of an estimation procedure: the maxiset point of view consists in
exhibiting the largest subspace of $L_2$ (maxiset) over which an estimator
attains a given rate of convergence. Proving that a functional space
$\mathcal{A}$ is the maxiset of a chosen procedure $\hat f$ for a rate
$r = (r_\varepsilon)_\varepsilon$ requires two steps. The first step is to
prove that

$$\sup_{\varepsilon > 0} r_\varepsilon^{-1}\, \mathbb{E}\|\hat f - f\|_2^2 < \infty \implies f \in \mathcal{A}.$$

The second step is to prove that

$$\sup_{f \in \mathcal{A}} \sup_{\varepsilon > 0} r_\varepsilon^{-1}\, \mathbb{E}\|\hat f - f\|_2^2 < \infty.$$

From now on, we denote by $MS(\hat f, (r_\varepsilon)_\varepsilon)$ the maxiset
of the procedure $\hat f$ associated with the rate of convergence
$r = (r_\varepsilon)_\varepsilon$. The two steps needed to establish a maxiset
result can be rewritten as the following embedding properties:
$MS(\hat f, (r_\varepsilon)_\varepsilon) \subset \mathcal{A}$ corresponds to the
first step and $\mathcal{A} \subset MS(\hat f, (r_\varepsilon)_\varepsilon)$ to
the second one.

Although the maxiset approach is not extremely different from the minimax
one, it is more optimistic since it provides a functional space directly
connected to the estimation procedure. Thus this theoretical criterion to
measure the performance of a chosen procedure appears to be more interesting
for practical purposes. Indeed, describing the maxiset of a procedure means
knowing the entire functional space of well-estimated functions. According to
this point of view, the larger the maxiset, the better the procedure. Moreover,
it is interesting to remark that if a procedure $\hat f^*$ is
$\mathcal{F}$-minimax optimal then

$$\mathcal{F} \subset MS(\hat f^*, (\rho_{\mathcal{F}}(\varepsilon))_\varepsilon).$$

In the wavelet setting and using the maxiset approach, many results have
appeared in nonparametric statistics. Cohen, De Vore, Kerkyacharian and Picard
[6] and Rivoirard [22] have proved that linear procedures are outperformed by
nonlinear ones in the density estimation model and the white noise model. In
particular, they have identified the maxisets of thresholding procedures with
the intersection of Besov spaces and specific Lorentz spaces, called weak Besov
spaces. More recently, Rivoirard [23] has shown that the maxisets of
thresholding procedures coincide with those of classical Bayesian procedures
associated with heavy-tailed priors. Kerkyacharian and Picard [16] have proved
that, under some conditions, the maxiset of the local bandwidth selection
procedure is at least as large as the one of the hard thresholding procedure,
but they have left two questions open: what exactly is the maxiset of the local
bandwidth selection procedure? Does the local bandwidth selection procedure
outperform the hard thresholding procedure in the maxiset sense?

The goal of this paper is to provide a new wavelet procedure which performs
very well under both the minimax and the maxiset approaches. In particular,
we aim at building a data-driven procedure which performs better than
the hard thresholding procedure. According to Autin [1] and [2], the only way
to succeed in doing this is to consider procedures which are not elitist, i.e.
procedures that may use some empirical wavelet coefficients smaller than the
threshold for the reconstruction of the signal. Here we propose a wavelet
procedure (hard tree rule) inspired by the local bandwidth selection procedure
of Lepski [17].

Firstly, this new wavelet procedure depends on the choice of a maximal scale
$j_{\max}$ which ensures that the estimate can be computed. According to this
parameter, any empirical wavelet coefficient of the target function with a
level index $j$ larger than or equal to $j_{\max}$ will not be considered for
the reconstruction. As in Autin [1] and [2], the choice of this maximal scale
will have a direct consequence on the shape of the maxiset.

Secondly, the new procedure is based on thresholding methods associated
with hereditary constraints (see Engel [13]). Using some ideas from tree
approximation (see Cohen, Dahmen, Daubechies and De Vore [4], Engel [13]), from
coding theory (see De Vore, Johnson, Pan and Sharpley [9], Said and Pearlman
[24], Shapiro [25]), and from image processing (see Wainwright, Martin et al.
[26] and Azimifar et al. [3]), we show that our new way of organizing the
signal reconstruction makes it possible to build a procedure with a very large
maxiset. This new procedure outperforms the hard thresholding one as well as
any elitist procedure in the maxiset sense.

The paper is organized as follows. Section 2 is devoted to the description 
of the model and the definitions of the basic tools we shall need. In Section 
3, we describe the method to construct our wavelet procedure and we show 
the relationship with the local bandwidth selection procedure of Lepski. The 
minimax and maxiset performances of the procedure are studied in Sections 
4 and 5. The comparison between the performances of this procedure and the 
hard thresholding ones is discussed in Section 6. A short conclusion is given in 
Section 7 while the proofs of our results are given in the Appendix. 

2. Model and definitions 
2.1. Model 

We consider the white noise model: $X_\varepsilon(\cdot)$ is a random variable
satisfying the following equation:

$$X_\varepsilon(dt) = f(t)\,dt + \varepsilon W(dt), \quad t \in [0, 1[,$$

where

• $0 < \varepsilon < 1/e$ is the noise level,

• $f$ is a function defined on $[0, 1[$,

• $W(\cdot)$ is the standard Brownian motion on $[0, 1[$.

Let $\{\psi_{jk}(\cdot),\ j \ge -1,\ k \in \mathbb{Z}\}$ be a wavelet basis of
$L_2([0, 1[)$ with $N$ vanishing moments ($N \in \mathbb{N}^*$) built by
multiresolution analysis from a scaling function and a wavelet supported on
$[0, 2N - 1[$. Any $f \in L_2([0, 1[)$ can be represented as:

$$f = \sum_{j \ge -1} \sum_{k \in \mathbb{Z}} \beta_{jk}\psi_{jk} = \sum_{j \ge -1} \sum_{k \in \mathbb{Z}} \langle f, \psi_{jk}\rangle \psi_{jk}. \qquad (2.1)$$

There exists a constant $S_\psi$ such that at each level $j \ge -1$ there are
at most $2^j S_\psi$ non-zero wavelet coefficients, whose indices $k$ form a
set denoted by $\mathcal{K}_j^\psi$. Hence, at each level $j$, the sum over $k$
in (2.1) can be replaced by the sum over $k \in \mathcal{K}_j^\psi$.

In our setting, we can get all the observations
$y_{jk} = X_\varepsilon(\psi_{jk}) = \beta_{jk} + \varepsilon Z_{jk}$,
where the $Z_{jk}$ are independent Gaussian variables $\mathcal{N}(0, 1)$.

Throughout this paper, for a real $\eta > 1$, we write
$2^{j_\lambda} \sim \lambda^{-2\eta}$ to designate the integer $j_\lambda$ such
that $2^{-j_\lambda} \le \lambda^{2\eta} < 2^{1-j_\lambda}$.
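
As an illustration of this notation, the following minimal Python sketch
computes the integer $j_\lambda$ and simulates the observations $y_{jk}$. The
function names, the dictionary representation of the coefficients and the toy
values are assumptions made for the example only; the sketch also assumes that
$j_\lambda$ is the smallest integer $j$ with $2^j \ge \lambda^{-2\eta}$.

```python
import random

def max_level(lam, eta):
    """Smallest integer j >= 0 such that 2**j >= lam**(-2 * eta)."""
    j = 0
    while 2 ** j < lam ** (-2 * eta):
        j += 1
    return j

def observe(beta, eps, rng):
    """Noisy coefficients y_jk = beta_jk + eps * z_jk, z_jk i.i.d. N(0, 1)."""
    return {idx: b + eps * rng.gauss(0.0, 1.0) for idx, b in beta.items()}

# Toy coefficients (hypothetical values), indexed by (j, k).
beta = {(0, 0): 1.0, (1, 0): 0.5, (1, 1): -0.25}
y = observe(beta, eps=0.05, rng=random.Random(0))

# With lam = 0.5 and eta = 1.5, lam**(-2 * eta) = 8, hence j_lam = 3.
j_lam = max_level(lam=0.5, eta=1.5)
```

With these values one can check that $2^{-j_\lambda} \le \lambda^{2\eta} < 2^{1-j_\lambda}$ holds.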

2.2. Definitions 

Definition 2.1. We say that an interval $I_{jk}$ is dyadic if it corresponds
to the support of the function $\psi_{jk}$ and we denote by
$|I_{jk}| = l_\psi 2^{-j}$ its length (where $l_\psi$ is the size of the
support of the mother wavelet function).

Definition 2.2. Let $\lambda > 0$ and $I_{jk}$ be a dyadic interval such that
$0 \le j < j_\lambda$. We denote by $\mathcal{T}_{jk}^{(\eta)}(\lambda)$ the
binary tree containing the set of the dyadic intervals such that the following
properties are satisfied:

- $I_{jk} \in \mathcal{T}_{jk}^{(\eta)}(\lambda)$.

- $I_{j'k'} \in \mathcal{T}_{jk}^{(\eta)}(\lambda) \implies I_{j'k'} \subset I_{jk}$ and $|I_{j'k'}| > l_\psi \lambda^{2\eta}$.

- Two distinct dyadic intervals of $\mathcal{T}_{jk}^{(\eta)}(\lambda)$ with
  the same length have disjoint interiors.

- The number of dyadic intervals of $\mathcal{T}_{jk}^{(\eta)}(\lambda)$ of
  length $l_\psi 2^{-j'}$ ($j \le j' < j_\lambda$) is equal to $2^{j'-j}$.

- The set of all dyadic intervals of $\mathcal{T}_{jk}^{(\eta)}(\lambda)$ with
  the same length forms a partition of $I_{jk}$.
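
To make Definition 2.2 concrete, here is a minimal Python sketch that lists
the indices of the tree rooted at a dyadic interval. It assumes the Haar-type
indexing $I_{jk} = [k2^{-j}, (k+1)2^{-j}[$, so that the intervals of level
$j'$ contained in $I_{jk}$ are those with $k2^{j'-j} \le k' < (k+1)2^{j'-j}$;
the function name `tree` is hypothetical.

```python
def tree(j, k, j_lam):
    """Indices (j', k') of the dyadic intervals of the tree rooted at I_jk,
    for the levels j <= j' < j_lam (Haar-type indexing assumed)."""
    return [(jp, kp)
            for jp in range(j, j_lam)
            for kp in range(k << (jp - j), (k + 1) << (jp - j))]

# Tree rooted at I_{1,1} = [1/2, 1[ when j_lam = 4.
nodes = tree(1, 1, 4)
# Number of intervals at each level j' of the tree.
counts = {jp: sum(1 for (a, _) in nodes if a == jp) for jp in range(1, 4)}
```

In this example `counts[jp]` equals $2^{j'-1}$ at each level, in line with the
fourth property of the definition.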

3. Construction of a new adaptive procedure 

The aim of this section is to provide a new wavelet procedure based on
thresholding methods which takes advantage of the dyadic structure of the
wavelet decomposition.

Let

$$f(t) = \sum_{j=-1}^{\infty} \sum_{k \in \mathcal{K}_j^\psi} \beta_{jk}\psi_{jk}(t), \quad t \in [0, 1[,$$

be a function to be estimated from the observations $y_{jk}$ of its wavelet
coefficients $\beta_{jk}$. We propose to estimate the function using only a
finite number of observations of wavelet coefficients, which is why we consider
the following family of Keep-Or-Kill estimators:

$$\mathcal{F}_K(\varepsilon) = \Big\{ \hat f(\cdot) = \sum_{j=-1}^{j_{\max}(\varepsilon)-1} \sum_{k \in \mathcal{K}_j^\psi} \gamma_{jk}\, y_{jk}\, \psi_{jk}(\cdot),\ \gamma_{jk} \in \{0, 1\} \Big\}.$$

Any procedure in $\mathcal{F}_K(\varepsilon)$ does not use the empirical
wavelet coefficients $y_{jk}$ for which the level $j$ is larger than or equal
to $j_{\max}(\varepsilon)$. This condition ensures that any procedure of
$\mathcal{F}_K(\varepsilon)$ can be computed numerically. As we shall see in
Section 5, the maxiset of the procedure considered depends on the choice of the
maximum scale $j_{\max}$.

In the sequel, we shall set
$\lambda_\varepsilon = m\varepsilon\sqrt{\log(\varepsilon^{-1})}$, where $m$ is
an absolute constant which will be chosen later and, for a fixed real number
$\eta > 1$ (maximum scale parameter), we shall denote by
$j_{\lambda_\varepsilon}$ the integer such that
$2^{j_{\lambda_\varepsilon}} \sim \lambda_\varepsilon^{-2\eta}$ and we shall put
$j_{\max}(\varepsilon) = j_{\lambda_\varepsilon}$.



3.1. Definition of the hard tree procedure 

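Let us consider the following procedure, namely the hard tree procedure,
defined for $\eta > 1$ by:

$$\hat f_T(\cdot) = \sum_{k \in \mathcal{K}_{-1}^\psi} y_{-1k}\psi_{-1k}(\cdot) + \sum_{j=0}^{j_{\lambda_\varepsilon}-1} \sum_{k \in \mathcal{K}_j^\psi} \gamma_{jk}\, y_{jk}\, \psi_{jk}(\cdot) \qquad (3.1)$$

with

• $\gamma_{jk} = 1$ if there exists $I_{j'k'}$ in $\mathcal{T}_{jk}^{(\eta)}(\lambda_\varepsilon)$ such that $|y_{j'k'}| > \lambda_\varepsilon$,

• $\gamma_{jk} = 0$ otherwise.

At first glance, this estimator is not very different from the hard
thresholding one recalled in (4.2). It consists in keeping the empirical
coefficients larger than $\lambda_\varepsilon$ and, somehow, in "filling the
holes" in each binary tree $\mathcal{T}_{jk}^{(\eta)}(\lambda_\varepsilon)$, as
we can see in Figure 1.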

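Notice that the hard tree estimator minimizes a penalized criterion. Indeed,

$$\hat f_T = \underset{\hat f \in \mathcal{F}_K(\varepsilon)}{\operatorname{Arg\,min}} \sum_{j=0}^{j_{\lambda_\varepsilon}-1} \sum_{k} (\gamma_{jk} - 1)^2\, |y_{jk}(\lambda_\varepsilon)|^2 + \gamma_{jk}^2 \lambda_\varepsilon^2,$$

where $|y_{jk}(\lambda_\varepsilon)| := \max\{|y_{j'k'}|,\ I_{j'k'} \in \mathcal{T}_{jk}^{(\eta)}(\lambda_\varepsilon)\}$.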
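
Since each weight only enters its own term of the criterion, the minimization
can be carried out coefficient by coefficient. The short Python sketch below
illustrates this, assuming that the term attached to one coefficient reads
$(\gamma - 1)^2 M^2 + \gamma^2\lambda^2$ with $M$ the maximum of the absolute
empirical coefficients over the tree; the function name is hypothetical and
ties are resolved in favour of $\gamma = 0$.

```python
def gamma_from_criterion(tree_max, lam):
    """Minimize (g - 1)**2 * tree_max**2 + g**2 * lam**2 over g in {0, 1}.

    tree_max stands for the largest absolute empirical coefficient in the
    tree rooted at the interval under consideration (assumed non-negative).
    """
    cost_kill = tree_max ** 2   # value of the term for g = 0
    cost_keep = lam ** 2        # value of the term for g = 1
    return 1 if cost_keep < cost_kill else 0

# The minimizer keeps the coefficient exactly when tree_max > lam.
decisions = [gamma_from_criterion(v, 0.2) for v in (0.05, 0.2, 0.3)]
```

Under this reading, the minimizer coincides with the weights $\gamma_{jk}$ of
the hard tree rule.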
Fig 1. Hard tree rule.



Moreover, this procedure is a tree rule (Engel [13]) since it satisfies the
following hereditary constraints:

$$\gamma_{jk} = 1 \implies \forall I_{j'k'} \text{ such that } I_{jk} \in \mathcal{T}_{j'k'}^{(\eta)}(\lambda_\varepsilon),\ \gamma_{j'k'} = 1.$$

Tree structures are often used in approximation theory and coding theory. For
more details, we refer the reader to the papers of Cohen, Dahmen, Daubechies
and De Vore [4], Cohen, Daubechies, Guleryuz and Orchard [5], De Vore,
Johnson, Pan and Sharpley [9], Said and Pearlman [24] and Shapiro [25].

3.2. Algorithm for the construction of hard tree rule 

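In this paragraph, we give the method to construct the hard tree procedure,
assuming that the noise level $\varepsilon$ is known.

Algorithm

Setup:

• Choose the reals $\eta > 1$ and $m > 0$ and put $\lambda_\varepsilon = m\varepsilon\sqrt{\log(\varepsilon^{-1})}$;

• Identify $j_{\lambda_\varepsilon} = \min\{j \in \mathbb{N},\ 2^j \ge \lambda_\varepsilon^{-2\eta}\}$.

Construction steps:

• Compute $y_{jk}$ with $k \in \mathcal{K}_j^\psi$ and level $j < j_{\lambda_\varepsilon}$;

• Threshold any $y_{jk}$ at level $\lambda_\varepsilon$ and construct the set of indices

$$\mathcal{I}(\lambda_\varepsilon) = \{(j, k),\ j \ge 0,\ k \in \mathcal{K}_j^\psi,\ |y_{jk}| > \lambda_\varepsilon\};$$

• Construct the set of indices

$$\mathcal{S}(\lambda_\varepsilon) = \bigcup_{(j,k) \in \mathcal{I}(\lambda_\varepsilon)} \{(j', k'), \text{ with } I_{j'k'} \supset I_{jk} \text{ and } I_{jk} \in \mathcal{T}_{j'k'}^{(\eta)}(\lambda_\varepsilon)\}.$$

Return:

• The estimator $\hat f_T = \sum_{k \in \mathcal{K}_{-1}^\psi} y_{-1k}\psi_{-1k} + \sum_{(j,k) \in \mathcal{S}(\lambda_\varepsilon)} y_{jk}\psi_{jk}$.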
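
The steps above can be sketched in Python as follows. This is only an
illustrative implementation under stated assumptions: the Haar-type indexing
$0 \le k < 2^j$ is used, so that the parent of $(j, k)$ is
$(j-1, \lfloor k/2 \rfloor)$; the empirical coefficients are stored in a
dictionary; and the function name and toy values (in particular $m = 1$, which
does not satisfy the conditions of the theorems below) are hypothetical.

```python
import math

def hard_tree_indices(y, eps, m, eta):
    """Return (S, lam, j_lam) for empirical coefficients y = {(j, k): y_jk}
    with j >= 0 and 0 <= k < 2**j (Haar-type indexing assumed)."""
    lam = m * eps * math.sqrt(math.log(1.0 / eps))   # threshold lambda_eps
    j_lam = 0
    while 2 ** j_lam < lam ** (-2 * eta):            # maximal scale j_lambda
        j_lam += 1
    # Indices of the coefficients exceeding the threshold, below j_lam.
    large = [(j, k) for (j, k), v in y.items()
             if 0 <= j < j_lam and abs(v) > lam]
    # Add each large coefficient together with all of its ancestors.
    keep = set()
    for j, k in large:
        while j >= 0 and (j, k) not in keep:
            keep.add((j, k))
            j, k = j - 1, k >> 1                     # move to the parent
    return keep, lam, j_lam

# Toy example: a single large coefficient at level 2.
y = {(0, 0): 0.0, (1, 0): 0.0, (1, 1): 0.0,
     (2, 0): 0.0, (2, 1): 0.0, (2, 2): 0.0, (2, 3): 5.0}
S, lam, j_lam = hard_tree_indices(y, eps=0.1, m=1.0, eta=1.5)
```

In this toy example the returned set contains the large coefficient $(2, 3)$
and its ancestors $(1, 1)$ and $(0, 0)$, although the empirical coefficients
of the latter two are below the threshold.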

3.3. Connection with Lepski's rule

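In this paragraph, we show that the hard tree rule can be viewed as a wavelet
version of the bandwidth selection procedure of Lepski [17] when the chosen
wavelet basis is the Haar one.

Notice that, in the Haar case, any dyadic interval is of the form
$I_{jk} = [\frac{k}{2^j}, \frac{k+1}{2^j}[$ with $j \in \mathbb{N}$ and
$k \in \{0, \dots, 2^j - 1\}$. Moreover, one gets a characterization of its
wavelet components $\psi_{jk}(\cdot)$. Indeed

$$\psi_{-1k}(\cdot) = \psi_{-1}(\cdot - k) \quad\text{and}\quad \psi_{jk}(\cdot) = 2^{j/2}\psi(2^j \cdot - k), \quad\text{with}$$

$$\psi_{-1} = 1_{[0,1[}(\cdot) \quad\text{and}\quad \psi = 1_{[0,\frac12[}(\cdot) - 1_{[\frac12,1[}(\cdot).$$

With this particular choice of wavelet basis, the hard tree procedure is
defined by

$$\hat f_T(\cdot) = y_{-10}\psi_{-10}(\cdot) + \sum_{j=0}^{j_{\lambda_\varepsilon}-1} \sum_{k=0}^{2^j-1} \gamma_{jk}\, y_{jk}\, \psi_{jk}(\cdot)$$

with $\gamma_{jk} = 1$ if there exists $I_{j'k'} \subset I_{jk}$ such that
$|I_{j'k'}| > \lambda_\varepsilon^{2\eta}$ and $|y_{j'k'}| > \lambda_\varepsilon$;
$\gamma_{jk} = 0$ otherwise.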

Let us now briefly recall the definition of the local bandwidth selection rule
(see Lepski [17] or Lepski, Mammen and Spokoiny [18] for more details).

Local bandwidth selection rule

Let $K$ be a compactly supported bounded kernel such that $\|K\|_{L_2} = 1$.
For any $j \in \mathbb{N}$ and any $(t, u) \in [0, 1[^2$, let us denote

$$K_j(t, u) = 2^j K(2^j t, 2^j u) \quad\text{and}\quad \hat K_j(t) = \int_0^1 K_j(t, u)\, dX_\varepsilon(u).$$

Let us define the index $\hat j(t)$ as the minimum of admissible $j$'s at the
point $t$, where $j \le j_{\lambda_\varepsilon}$ is admissible at the point $t$
if $j = j_{\lambda_\varepsilon}$ or

$$|\hat K_{j'+1}(t) - \hat K_{j'}(t)| \le 2^{j'/2}\lambda_\varepsilon \quad \forall\, j \le j' < j_{\lambda_\varepsilon}. \qquad (3.2)$$



F. Autin/ On the performances of a new thresholding procedure using tree structure 419 



The local bandwidth selection estimator $\hat f_L$ is defined by:

$$\hat f_L(t) = \hat K_{\hat j(t)}(t).$$

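The definition of the hard tree rule is close to the definition of the local
bandwidth selection procedure. Indeed, let us adapt the notion of admissibility
from kernel estimators to wavelet estimators by considering the family of
estimators $(\hat f_j)_{j \in \mathbb{N}}$ defined as follows:

• $\hat f_0(t) = y_{-10}\psi_{-10}(t)$,

• $\hat f_{j+1}(t) = \hat f_j(t) + \sum_{k=0}^{2^j-1} y_{jk}\psi_{jk}(t)$.

If for any $t \in [0, 1[$ we denote by $I_{jt}$ the dyadic interval containing
$t$ such that $|I_{jt}| = 2^{-j}$, and by $y_{jt}$ the empirical coefficient
associated with it, then

$$|\hat f_{j+1}(t) - \hat f_j(t)| = \Big|\sum_{k=0}^{2^j-1} y_{jk}\psi_{jk}(t)\Big| = 2^{j/2}|y_{jt}|. \qquad (3.3)$$

Definition 3.1. We say that an integer $j$ is $(t,T)$-admissible if:
either $j = j_{\lambda_\varepsilon}$ or, for all $j \le j' < j_{\lambda_\varepsilon}$,
for all $t' \in I_{jt}$:

$$|\hat f_{j'+1}(t') - \hat f_{j'}(t')| \le 2^{j'/2}\lambda_\varepsilon.$$

Denote $\hat j_T(t) = \inf\{j;\ j \text{ is } (t,T)\text{-admissible}\}$. Still
using (3.3) we can observe that:

$$\hat f_{\hat j_T(t)}(t) = \hat f_T(t). \qquad (3.4)$$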

So, by adapting the notion of admissibility from kernel procedures to wavelet
procedures, we have shown that the adaptive procedure (hard tree rule) and
Lepski's rule are analogous when considering the particular choice of the
kernel $K$:

$$K_j(x, y) = 2^j \sum_{k=0}^{2^j-1} \psi_{-1}(2^j x - k)\,\psi_{-1}(2^j y - k).$$
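
The following Python sketch illustrates the admissibility rule in the Haar
case. By (3.3), an integer $j$ is $(t,T)$-admissible exactly when no empirical
coefficient in the tree rooted at the level-$j$ dyadic interval containing $t$
exceeds $\lambda_\varepsilon$ in absolute value. The function names, the
dictionary storage of the coefficients (missing entries are treated as zero)
and the toy values are assumptions made for the example.

```python
def subtree_max(y, j, k, j_lam):
    """Largest |y_{j'k'}| over the tree rooted at I_jk (levels j..j_lam-1)."""
    return max(abs(y.get((jp, kp), 0.0))
               for jp in range(j, j_lam)
               for kp in range(k << (jp - j), (k + 1) << (jp - j)))

def admissible_level(y, t, lam, j_lam):
    """Smallest (t,T)-admissible level at the point t in [0, 1[."""
    for j in range(j_lam):
        k = int(t * 2 ** j)          # index of the level-j interval containing t
        if subtree_max(y, j, k, j_lam) <= lam:
            return j
    return j_lam

# Toy example: one large coefficient, located at (j, k) = (2, 3).
y = {(2, 3): 5.0}
level_right = admissible_level(y, t=0.9, lam=1.0, j_lam=4)
level_left = admissible_level(y, t=0.1, lam=1.0, j_lam=4)
```

At $t = 0.9$ the levels 0, 1 and 2 are not admissible, so the coefficients of
these levels along the path to $t$ are kept, which is what the hard tree rule
does; at $t = 0.1$ only level 0 is kept.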
4. Minimax result 

In this paragraph we aim at studying the performance associated with the hard
tree rule in the minimax context.

At first, let us recall the definition of Besov spaces $B^s_{2,\infty}$ with
$0 < s < N$.

Definition 4.1. Let $0 < s < N$. We say that a function $f \in L_2([0, 1[)$
belongs to the Besov space $B^s_{2,\infty}$ if and only if:

$$\sup_{J \ge -1} 2^{2Js} \sum_{j \ge J} \sum_{k} \beta_{jk}^2 < \infty.$$
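
For a finite set of coefficients, the quantity appearing in Definition 4.1 can
be evaluated directly. The Python sketch below assumes that the characterization
reads $\sup_{J \ge -1} 2^{2Js}\sum_{j \ge J}\sum_k \beta_{jk}^2$ and that the
coefficients are given level by level, starting at $j = -1$; the function name
and the toy values are hypothetical.

```python
def besov_functional(levels, s):
    """Evaluate max over J of 2**(2*J*s) * sum_{j >= J} sum_k beta_jk**2.

    levels[i] is the list of coefficients of level j = i - 1.
    """
    energies = [sum(b * b for b in level) for level in levels]
    best, tail = 0.0, 0.0
    # Walk from the finest level to the coarsest, accumulating tail energies.
    for i in range(len(levels) - 1, -1, -1):
        tail += energies[i]
        J = i - 1
        best = max(best, 2.0 ** (2 * J * s) * tail)
    return best

# Toy coefficients at levels j = -1, 0, 1.
value = besov_functional([[1.0], [0.5], [0.25, 0.25]], s=0.5)
```

For a function in $B^s_{2,\infty}$ this quantity stays bounded when finer and
finer levels are included.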
Besov spaces constitute a large class of functional spaces. Recall that for any
$s > 0$ the Sobolev space $H^s$ is included in $B^s_{2,\infty}$. Moreover, if
$\subsetneq$ denotes the strict inclusion between two functional spaces,

$$B^s_{2,\infty} \subsetneq B^{s'}_{2,\infty} \quad\text{for any } 0 < s' < s < N. \qquad (4.1)$$

Besov spaces are important in statistics since the maximal spaces of many
classical procedures like linear procedures (see Kerkyacharian and Picard [14]
and Rivoirard [22]) and thresholding procedures (see Cohen, De Vore,
Kerkyacharian and Picard [6] and Kerkyacharian and Picard [16]) are included
in Besov spaces.

We prove in the following theorem that the hard tree procedure is
$B^s_{2,\infty}$-minimax optimal up to a logarithmic term which is known to be
the price to pay for adaptation.

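Theorem 4.1. Let $0 < s < N$ and $\eta > 1$. Choose $m > 4\sqrt{3\eta}$. Then
for any $f \in B^s_{2,\infty}$

$$\sup_{\varepsilon} \lambda_\varepsilon^{-4s/(1+2s)}\, \mathbb{E}\|\hat f_T - f\|_2^2 < \infty.$$

This result is just a consequence of Theorem 5.2 using the embedding
properties (5.1) and (5.2). This theorem shows that the hard tree procedure
described in Section 3 performs very well. Moreover, let us recall the minimax
result for the hard thresholding procedure:

$$\hat f_H(\cdot) = \sum_{k \in \mathcal{K}_{-1}^\psi} y_{-1k}\psi_{-1k}(\cdot) + \sum_{j=0}^{j_{\lambda_\varepsilon}-1} \sum_{k \in \mathcal{K}_j^\psi} y_{jk}\, 1\{|y_{jk}| > \lambda_\varepsilon\}\, \psi_{jk}(\cdot). \qquad (4.2)$$

Theorem 4.2. Let $0 < s < N$ and $\eta > 1$. Choose $m > 4\sqrt{2\eta}$. Then
for any $f \in B^s_{2,\infty}$

$$\sup_{\varepsilon} \lambda_\varepsilon^{-4s/(1+2s)}\, \mathbb{E}\|\hat f_H - f\|_2^2 < \infty.$$

This minimax result is a direct consequence of Theorem 5.1 of Section 5 using
the embedding property (5.1).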
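
The estimators (3.1) and (4.2) differ only through the set of empirical
coefficients they retain. The minimal Python sketch below compares the two
sets, assuming the Haar-type indexing in which the parent of $(j, k)$ is
$(j-1, \lfloor k/2 \rfloor)$; the function names and the toy values are
hypothetical.

```python
def hard_threshold_keep(y, lam, j_lam):
    """Indices retained by hard thresholding: |y_jk| > lam, 0 <= j < j_lam."""
    return {(j, k) for (j, k), v in y.items()
            if 0 <= j < j_lam and abs(v) > lam}

def hard_tree_keep(y, lam, j_lam):
    """Indices retained by the hard tree rule: the thresholded indices
    together with all their ancestors."""
    keep = set()
    for j, k in hard_threshold_keep(y, lam, j_lam):
        while j >= 0:
            keep.add((j, k))
            j, k = j - 1, k >> 1   # parent index (Haar-type indexing assumed)
    return keep

# Toy empirical coefficients.
y = {(0, 0): 0.1, (1, 0): 2.0, (1, 1): 0.2, (2, 3): 3.0}
kept_h = hard_threshold_keep(y, lam=1.0, j_lam=3)
kept_t = hard_tree_keep(y, lam=1.0, j_lam=3)
```

The set retained by the hard tree rule always contains the one retained by hard
thresholding; here it additionally contains $(0, 0)$ and $(1, 1)$, whose
empirical coefficients are below the threshold.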

Remark 4.1. It is important to notice here that the minimax results given in
Theorems 4.1 and 4.2 are valid for any choice of compactly supported wavelet
basis provided that its number of vanishing moments $N$ is strictly greater
than $s$.

From the last two theorems we deduce:

Corollary 4.1. For any $0 < s < N$ and any choice of $\eta > 1$, the hard tree
procedure has the same performance as the hard thresholding procedure from the
minimax point of view when considering the same threshold level
$\lambda_\varepsilon = m\varepsilon\sqrt{\log(\varepsilon^{-1})}$ with
$m > 4\sqrt{3\eta}$. Precisely, both procedures are $B^s_{2,\infty}$-minimax
optimal (up to a logarithmic term).

A natural question arises here: could these procedures be discriminated when
adopting the maxiset point of view? The answer is yes, as we shall see.

5. Maxiset result 

In this section, we aim at calculating the maxiset associated with the hard
tree procedure so as to compare it with the one of the hard thresholding
procedure when the rate of convergence is
$(\lambda_\varepsilon^{4s/(1+2s)})_\varepsilon$, $0 < s < N$.

We first recall the maxiset result given by Kerkyacharian and Picard [15] for
the hard thresholding estimator.

5.1. Maxiset of the hard thresholding procedure 

Let us introduce the following functional space. 

Definition 5.1. Let $0 < r < 2$. We say that a function $f$ belongs to the
weak Besov space $W_r$ if and only if:

$$\sup_{\lambda > 0} \lambda^{r-2} \sum_{j=0}^{\infty} \sum_{k} \beta_{jk}^2\, 1\{|\beta_{jk}| \le \lambda\} < \infty.$$

Weak Besov spaces form a subfamily of Lorentz spaces (see Lorentz [19],
[20] or De Vore and Lorentz [11]). There exists a natural relationship between
Besov spaces and weak Besov spaces. The following embedding can be easily
proved (see for instance Rivoirard [22]):

$$B^s_{2,\infty} \subsetneq B^{\frac{s}{\eta(1+2s)}}_{2,\infty} \cap W_{\frac{2}{1+2s}} \quad\text{for any } 0 < s < N \text{ and any } \eta > 1. \qquad (5.1)$$

Kerkyacharian and Picard [15] and [16] have pointed out the strong connection
between these functional spaces and the hard thresholding procedure.
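
For a finite set of coefficients, the quantity defining the weak Besov space
can be evaluated on a grid of thresholds. The Python sketch below assumes that
the functional of Definition 5.1 reads
$\sup_{\lambda} \lambda^{r-2}\sum_{j,k}\beta_{jk}^2 1\{|\beta_{jk}| \le \lambda\}$;
the function name, the grid of values of $\lambda$ and the toy coefficients are
hypothetical.

```python
def weak_besov_functional(beta, r, lambdas):
    """Maximum over the grid `lambdas` of
    lam**(r - 2) * sum of beta**2 over the coefficients with |beta| <= lam."""
    return max(lam ** (r - 2) * sum(b * b for b in beta if abs(b) <= lam)
               for lam in lambdas)

# Toy coefficients, listed without their (j, k) indices: the functional is
# invariant under any reordering of the coefficients.
beta = [1.0, 0.5, 0.25]
value = weak_besov_functional(beta, r=1.0, lambdas=[0.25, 0.5, 1.0])
permuted = weak_besov_functional(beta[::-1], r=1.0, lambdas=[0.25, 0.5, 1.0])
```

The invariance under permutations visible here is specific to weak Besov
spaces; a tree-based variant of this functional depends on where the
coefficients are located.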

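Theorem 5.1 (Kerkyacharian-Picard). Let $0 < s < N$ and $\eta > 1$. For any
$m > 4\sqrt{2\eta}$, we have the following equivalence:

$$\sup_{0 < \varepsilon < 1/e} \lambda_\varepsilon^{-4s/(1+2s)}\, \mathbb{E}\|\hat f_H - f\|_2^2 < \infty \iff f \in B^{\frac{s}{\eta(1+2s)}}_{2,\infty} \cap W_{\frac{2}{1+2s}},$$

that is to say, using the maxiset notation:

$$MS(\hat f_H, (\lambda_\varepsilon^{4s/(1+2s)})_\varepsilon) = B^{\frac{s}{\eta(1+2s)}}_{2,\infty} \cap W_{\frac{2}{1+2s}}.$$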

5.2. Maxiset of the hard tree procedure 

In this paragraph, we exhibit the maxiset of the hard tree procedure
associated with the rate $(\lambda_\varepsilon^{4s/(1+2s)})_\varepsilon$.

Let us first define another functional space that will be useful in the
characterization of the maximal space associated with the hard tree procedure.

Definition 5.2. Let $0 < r < 2$ and $\eta > 1$. We say that a function $f$
belongs to the space $W^T_{r,\eta}$ if and only if:

$$\sup_{\lambda > 0} \lambda^{r-2} \sum_{j=0}^{j_\lambda-1} \sum_{k} \beta_{jk}^2\, 1\Big\{\forall I_{j'k'} \in \mathcal{T}_{jk}^{(\eta)}(\lambda),\ |\beta_{j'k'}| \le \frac{\lambda}{2}\Big\} < \infty.$$

In contrast to weak Besov spaces, note that the spaces $W^T_{r,\eta}$
($0 < r < 2$, $\eta > 1$) are not invariant under permutations of wavelet
coefficients within each scale. The following proposition shows that, for the
same parameter $r$ ($0 < r < 2$), any functional space $W^T_{r,\eta}$ contains
the weak Besov space $W_r$. Thanks to this result, a comparison between the
maximal sets of the hard tree rule and the hard thresholding rule will be
possible, as we will see in Section 6.

Proposition 5.1. For any $0 < r < 2$ and any $\eta > 1$, we have the following
inclusion of spaces:

$$W_r \subsetneq W^T_{r,\eta}. \qquad (5.2)$$

Proposition 5.1 shows that for any parameters $0 < r < 2$ and $\eta > 1$, the
spaces $W_r$ and $W^T_{r,\eta}$ are different.

Theorem 5.2. Let $0 < s < N$ and $\eta > 1$. For any $m > 4\sqrt{3\eta}$, we
have the following equivalence:

$$\sup_{\varepsilon} \lambda_\varepsilon^{-4s/(1+2s)}\, \mathbb{E}\|\hat f_T - f\|_2^2 < \infty \iff f \in B^{\frac{s}{\eta(1+2s)}}_{2,\infty} \cap W^T_{\frac{2}{1+2s},\eta},$$

that is to say, using the maxiset notation:

$$MS(\hat f_T, (\lambda_\varepsilon^{4s/(1+2s)})_\varepsilon) = B^{\frac{s}{\eta(1+2s)}}_{2,\infty} \cap W^T_{\frac{2}{1+2s},\eta}.$$

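To prove this theorem we shall need the following proposition.

Proposition 5.2. Fix $0 < \lambda_0 < 1$ and $\eta > 1$. For any $0 < r < 2$
and any $f \in B^{\frac{2-r}{4\eta}}_{2,\infty} \cap W^T_{r,\eta}$,

$$\sup_{0 < \lambda < \lambda_0} \lambda^r \Big[\log\Big(\frac{1}{\lambda}\Big)\Big]^{-1} \sum_{j=0}^{j_\lambda-1} \sum_{k} 1\Big\{\exists I_{j'k'} \in \mathcal{T}_{jk}^{(\eta)}(\lambda) \,/\, |\beta_{j'k'}| > \frac{\lambda}{2}\Big\} < \infty. \qquad (5.3)$$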



Remark 5.1. It is important to notice here that the maxiset results given in
Theorems 5.1 and 5.2 are valid for any choice of compactly supported wavelet
basis provided that its number of vanishing moments $N$ is strictly greater
than $s$.

Following the two previous sections, let us comment on the minimax and maxiset
performances of the hard tree rule.
6. On the performances of the hard tree procedure
6.1. Consequences of previous results

Judging from Corollary 4.1 of Section 4, the hard tree procedure and the hard
thresholding one are equivalent in the minimax sense.

From Proposition 5.1 and Theorem 5.2 we easily deduce that the hard tree
procedure performs very well in the maxiset sense. Indeed, for a chosen
$\eta > 1$, its maxiset for the rate
$(\lambda_\varepsilon^{4s/(1+2s)})_\varepsilon$ corresponds to the intersection
between the usual Besov space $B^{\frac{s}{\eta(1+2s)}}_{2,\infty}$ and another
functional space $W^T_{\frac{2}{1+2s},\eta}$ strictly larger than the classical
weak Besov space $W_{\frac{2}{1+2s}}$. Hence,

Corollary 6.1. In the maxiset sense, the hard tree procedure is at least as good
as the hard thresholding procedure since its maxiset for the rate
$(\lambda_\varepsilon^{4s/(1+2s)})_\varepsilon$ contains the maxiset of the hard
thresholding procedure.

It is important to notice that a strict inclusion between the maxisets of the
hard tree rule and the hard thresholding rule cannot be immediately deduced
from previous results because of the intersections with the Besov space. At
present, it is an open question whether the inclusion between maxisets is strict
or not. Nevertheless, we give in the sequel results which address a slightly
weaker problem.



6.2. More results on space embeddings



Proposition 6.1. For any $0 < s < N$ and any $\eta > 1$ the following space
embedding holds:

S^'n^ c B^^'*nw T 2 . 

l + 2s ^ z >°° i +2 s >V 

with 



°2,oo 



i=-i 



fee/eg 



sup 
J>-1 



~>2Ju \ " 



2 3 -l 



< 00 



According to Proposition 6.1, the strict inclusions of functional spaces are
still valid when intersecting $W_{\frac{2}{1+2s}}$ and
$W^T_{\frac{2}{1+2s},\eta}$ with the hybrid Besov space defined above. From
this result one immediately derives

Corollary 6.2. Let $0 < s < N$ and $\eta > 1$. The following space embedding
holds for all $u < \frac{s}{\eta(1+2s)}$:

$$B^u_{2,\infty} \cap W_{\frac{2}{1+2s}} \subsetneq B^u_{2,\infty} \cap W^T_{\frac{2}{1+2s},\eta}.$$
Moreover,

$$\bigcap_{u < \frac{s}{\eta(1+2s)}} B^u_{2,\infty} \cap W_{\frac{2}{1+2s}} \subsetneq \bigcap_{u < \frac{s}{\eta(1+2s)}} B^u_{2,\infty} \cap W^T_{\frac{2}{1+2s},\eta}.$$

Thus the embeddings of spaces with strict inclusion are still valid when
considering intersections of spaces very close to the maxisets we have studied.
Hence it is reasonable to claim that the hard tree procedure is better than the
hard thresholding procedure in the maxiset sense.



6.3. On the choice of parameter $\eta$



In Sections 4 and 5 we gave minimax and maxiset results on the hard tree
procedure for any choice of parameter $\eta > 1$. Precisely, the regularity
parameter of the Besov space appearing in the maxisets of the hard tree and
hard thresholding procedures depends on the choice of $\eta$. Hence it is
interesting to know whether an optimal choice of $\eta$ is possible, so as to
build the hard tree rule with the largest maxiset. In fact, the bigger the
parameter $\eta$, the larger the maxiset of the hard tree rule. Indeed

Proposition 6.2. For any $0 < s < N$ and any $1 < \eta_1 < \eta_2$, the
following space embedding holds:

$$B^{\frac{s}{\eta_1(1+2s)}}_{2,\infty} \cap W^T_{\frac{2}{1+2s},\eta_1} \subsetneq B^{\frac{s}{\eta_2(1+2s)}}_{2,\infty} \cap W^T_{\frac{2}{1+2s},\eta_2}.$$

Nevertheless, our results are asymptotic. If at first glance we opt for a very
large $\eta$, we must be careful about the deterioration of the rate of
convergence considered. Indeed, choosing a large $\eta$ implies taking a large
$m$. As a consequence, the rate of convergence
$(\lambda_\varepsilon^{4s/(1+2s)})_\varepsilon$ goes to zero more slowly for
such a choice.



7. Conclusion 



The key point of this paper was to prove that a way to build procedures that
perform very well is to combine thresholding methods and tree structure.
Indeed, the new wavelet procedure, called the hard tree procedure, is proved to
perform very well in the minimax and the maxiset settings. Although this
procedure looks like the hybrid version of Lepski's procedure proposed by
Picard and Tribouley [21], namely the hard stem rule, it is different (see
Autin [1]) and has advantages compared with the hard stem rule. Firstly, Autin
[1] has proved that the maxiset of the hard tree rule contains the one of the
hard stem rule. Secondly, the hard stem rule is a procedure which is only
defined with the Haar wavelet basis. Indeed, the hard stem procedure is built
at fixed $t \in [0, 1[$ and therefore requires wavelet functions $\psi_{jk}$
with disjoint supports. In contrast, the hard tree rule is defined for any
compactly supported wavelet basis.
8. Appendix

8.1. Proofs of Propositions

Here and later, the constants $C$ represent all the constants we shall need and
can be different from one line to another.

Proof of Proposition 5.1. Let $\eta > 1$. The large inclusion is obvious when
remarking that, since $I_{jk} \in \mathcal{T}_{jk}^{(\eta)}(\lambda)$, for any
sequence of wavelet coefficients
$(\beta_{jk},\ j \ge 0,\ k \in \mathcal{K}_j^\psi)$, for any $0 < \lambda < 1$
and any $0 \le j < j_\lambda$

$$1\Big\{\forall I_{j'k'} \in \mathcal{T}_{jk}^{(\eta)}(\lambda),\ |\beta_{j'k'}| \le \frac{\lambda}{2}\Big\} \le 1\{|\beta_{jk}| \le \lambda\}.$$

The strict inclusion is a direct consequence of Proposition 6.1. □

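Proof of Proposition 5.2. Let $f \in B^{\frac{2-r}{4\eta}}_{2,\infty} \cap W^T_{r,\eta}$
and $0 < \lambda < \lambda_0$. We set for any $u \in \mathbb{N}$,
$2^{j_{\lambda,u}} \sim (2^{1+u}\lambda)^{-2\eta}$. We have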

^l{3/, fc ,E7;<"(A)/fe|>^ 

3=0 fc 

< C JA ^(2"- 1 A)- 2 

JA-1 

x J2 ^/3| fe l{2- 1 A < \0 jh \ < 2"A, \/I,, k , G T^(2 1+u X),\^, kl \ < 2"A} 

j=0 k 

< Clog (I) ^(2- 1 A)" 2 

3A,«-1 

X H ^^ fc l{2 1 ^ 1 A < |/3 jfc | < 2" A, V/^v G 3#>(2 1+ "A), \%, k ,\ < 2" A} 

CO 

+ Clo g (i)^(2"-A)- £ 

< C log(i)A-. 

The last inequality uses the fact that $f \in B^{\frac{2-r}{4\eta}}_{2,\infty} \cap W^T_{r,\eta}$. □

To prove some strict inclusions in Proposition 6.1 and Proposition 6.2, let us
introduce the function $h[m, \alpha, \alpha_1, \alpha_2](\cdot)$ of
$L_2([0, 1[)$ for which the sequence of Haar wavelet coefficients
$(\beta_{jk})_{jk}$ satisfies at each level $j$:

• if $j$ is even, $\lfloor (mj + 1)2^{j\alpha} \rfloor \wedge 2^j$ wavelet
coefficients $\beta_{jk}$ are equal to $2^{-\alpha_1 j}$, the others are equal
to 0,

• if $j$ is odd, $\lfloor (mj + 1)2^{j\alpha} \rfloor \wedge 2^j$ wavelet
coefficients $\beta_{jk}$ are equal to $2^{-\alpha_2 j}$, the others are equal
to 0,

and

$$\beta_{jk} \ne 0 \implies \max(\beta_{j+1\,2k},\ \beta_{j+1\,2k+1}) \ne 0.$$
Proof of Proposition 6.1. Fix $\eta > 1$ and $0 < s < N$. Looking at
Proposition 5.1 with $r = 2(1 + 2s)^{-1}$, the large inclusion is obvious. For
any $0 < s < 1$, the strict inclusion is given by considering the function
$h[m, \alpha, \alpha_1, \alpha_2](\cdot)$ with the parameters

$$m = 1, \quad \alpha = (\eta(1 + 2s))^{-1}, \quad \alpha_1 = 1, \quad \alpha_2 = (2\eta)^{-1},$$



which belongs to the space Sooo" 2 "' ^ W 2_ but does not belong to the space 

' l + 2s >V 

W 2 . □ 

l + 2s 

Proof of Proposition 6.2. Let us first prove that for any $1 < \eta_1 < \eta_2$
and any $0 < s < N$,



Let f G B 9 " 1 i! +2s ' n W 2 and < A < 1. We set 2^ - A -2 '' 1 and 2^- 2 
A" 2 '' 2 . For any j > and any fc, we have 

j— n t. L 



i=o fc 
J'x.i-l 



* E E/^K- ^ ^) ( a), \ M < M + £ £0. 

i=o fc k J i=jA,i fc 

* E E^ 1 K,,g^(a), 1^1 <M+ £ £/?. 



< C(ATrfe+2 'ud+2»)) 

< CATfe. 



The last inequality uses the fact that / e R^oo n IF 2 . Hence / G W 2 . 

1 l + 2s l + 2s'' 72 

For any $0 < s < 1$, the strict inclusion is given by considering the function
$h[m, \alpha, \alpha_1, \alpha_2](\cdot)$ with the parameters

m = 0,a= (772(1 + 2s)) -1 , an = a 2 = ^Tft) -1 , 



which belongs to the space B2 2 ^ +2s) (~^W 2 but does not belong to the space 

' l + 2s I 1 ? 2 

/3|™. □ 
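8.2. Proof of Theorem 5.2

The proof needs two steps. At first we have to prove

$$\text{STEP 1:} \quad MS(\hat f_T, (\lambda_\varepsilon^{4s/(1+2s)})_\varepsilon) \subset B^{\frac{s}{\eta(1+2s)}}_{2,\infty} \cap W^T_{\frac{2}{1+2s},\eta}. \qquad (8.1)$$

Then

$$\text{STEP 2:} \quad MS(\hat f_T, (\lambda_\varepsilon^{4s/(1+2s)})_\varepsilon) \supset B^{\frac{s}{\eta(1+2s)}}_{2,\infty} \cap W^T_{\frac{2}{1+2s},\eta} \qquad (8.2)$$

must be proved.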

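STEP 1: Let $f \in MS(\hat f_T, (\lambda_\varepsilon^{4s/(1+2s)})_\varepsilon)$.
Since $\hat f_T$ does not use the levels $j \ge j_{\lambda_\varepsilon}$, we
have,

$$\sum_{j \ge j_{\lambda_\varepsilon}} \sum_{k} \beta_{jk}^2 \le \mathbb{E}\|\hat f_T - f\|_2^2 \le C\lambda_\varepsilon^{4s/(1+2s)}.$$

So, using the continuity of $\lambda_\varepsilon$ in 0, we deduce that

$$\sup_{J \ge 0} 2^{\frac{2Js}{\eta(1+2s)}} \sum_{j \ge J} \sum_{k} \beta_{jk}^2 < \infty.$$

It follows that $f \in B^{\frac{s}{\eta(1+2s)}}_{2,\infty}$. Let us now denote
for any $\lambda > 0$

• $|y_{jk}(\lambda)| := \max\{|y_{j'k'}|;\ I_{j'k'} \in \mathcal{T}_{jk}^{(\eta)}(\lambda)\}$,
• $|\beta_{jk}(\lambda)| := \max\{|\beta_{j'k'}|;\ I_{j'k'} \in \mathcal{T}_{jk}^{(\eta)}(\lambda)\}$,
• $|\delta_{jk}(\lambda)| := \max\{|y_{j'k'} - \beta_{j'k'}|;\ I_{j'k'} \in \mathcal{T}_{jk}^{(\eta)}(\lambda)\}$.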
Remark 8.1. For any A > 0, 

< t} V T yk , G ^(A), < 

> ^ 3 I,-,*, G T/^(A), fe| > * 

Note that |yjfc(.)|, |/3jfc(-)l an d l^jfcOl are decreasing functions with respect 
to A. 

Choosing m 2 > I677, we have 

3=0 k 

E E^H^^'-t} 

E E$ fc l{fe(A e )| < ^} [l{|fo fc (A.)| < A E } + l{|y Jk (A«)| > A E }] 



j=0 k 

E 

3=0 fc 
2A e -l



< E E ( &* ~ ™-fc) 2 l{l&fcOe)| < A £ } 

2=0 k 

+ E ^ ^/3| fe P(fc fc (A E )| > A e )l{fe(A e )| < y} 



2=0 fc 
3X.-1 



2X--1 



2=0 fe 2=0 fe 

< n\f T -f\\l+Ch e 2^\ 2 e e^ 

< E||/ r -/||2+CA 2 

CAiP 5 . 



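So, using the continuity of $\lambda_\varepsilon$ in 0, we deduce that

$$\sup_{\lambda > 0} \lambda^{-\frac{4s}{1+2s}} \sum_{j=0}^{j_\lambda-1} \sum_{k} \beta_{jk}^2\, 1\Big\{\forall I_{j'k'} \in \mathcal{T}_{jk}^{(\eta)}(\lambda),\ |\beta_{j'k'}| \le \frac{\lambda}{2}\Big\} < \infty.$$

It follows that $f \in W^T_{\frac{2}{1+2s},\eta}$. So (8.1) is proved.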



STEP 2: Let e m > be such that £mylog(^) < m 1 . It suffices to prove that 
for any < e < e m 



E\\f T -f\\ 2 2 = Ce 2 +E 



3=0 fc 



2 J=Ja £ fc 



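The term $\sum_{j \ge j_{\lambda_\varepsilon}} \sum_{k} \beta_{jk}^2$ can be
bounded by $C\lambda_\varepsilon^{4s/(1+2s)}$, by using the definition of the
Besov space $B^{\frac{s}{\eta(1+2s)}}_{2,\infty}$.

The term
$\mathbb{E}\sum_{j=0}^{j_{\lambda_\varepsilon}-1} \sum_{k} (\gamma_{jk}y_{jk} - \beta_{jk})^2$
can be bounded by $C + D$, where

$$C + D = \mathbb{E}\sum_{j=0}^{j_{\lambda_\varepsilon}-1} \sum_{k} \beta_{jk}^2\, 1\{|y_{jk}(\lambda_\varepsilon)| \le \lambda_\varepsilon\} + \mathbb{E}\sum_{j=0}^{j_{\lambda_\varepsilon}-1} \sum_{k} (y_{jk} - \beta_{jk})^2\, 1\{|y_{jk}(\lambda_\varepsilon)| > \lambda_\varepsilon\}.$$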
j=0 fc 
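We split $C$ into $C_1 + C_2$ as follows:

$$C_1 = \mathbb{E}\sum_{j=0}^{j_{\lambda_\varepsilon}-1} \sum_{k} \beta_{jk}^2\, 1\{|y_{jk}(\lambda_\varepsilon)| \le \lambda_\varepsilon\}\, 1\{|\beta_{jk}(\lambda_\varepsilon)| \le 2\lambda_\varepsilon\},$$

$$C_2 = \mathbb{E}\sum_{j=0}^{j_{\lambda_\varepsilon}-1} \sum_{k} \beta_{jk}^2\, 1\{|y_{jk}(\lambda_\varepsilon)| \le \lambda_\varepsilon\}\, 1\{|\beta_{jk}(\lambda_\varepsilon)| > 2\lambda_\varepsilon\}.$$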



Since / € 2 and / e i3 2 ^ 



>7(l + 2s) 
OO ) 



JA C -1 

C i = E E UlyifcWI < AJ1{|^(A £ )| < 2AJ 

3=0 k 

iA e -5-Llog 2 (»?)J OO 

< E £/^i{iM4A £ )i<2A £ } + E 

3=0 k J=JA £ -4-Llog 2 (i7)J fe 



< CA £ 1+2s 

and 



iA e -i 

C 2 = EEE^* < Ae}l{|^/c(A e )| > 2A £ } 

j=0 fc 

JAe-1 

]T $>? fc P(fo fc (A 6 )|>A e } 

3=0 k 



< 



< CA e *. 

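We have used here the concentration property of the Gaussian distribution and
the fact that $m^2 > 4(1 + \eta)$.

We split $D$ into $D_1 + D_2$ as follows:

$$D_1 = \mathbb{E}\sum_{j=0}^{j_{\lambda_\varepsilon}-1} \sum_{k} (y_{jk} - \beta_{jk})^2\, 1\{|y_{jk}(\lambda_\varepsilon)| > \lambda_\varepsilon\}\, 1\Big\{|\beta_{jk}(\lambda_\varepsilon)| \le \frac{\lambda_\varepsilon}{2}\Big\},$$

$$D_2 = \mathbb{E}\sum_{j=0}^{j_{\lambda_\varepsilon}-1} \sum_{k} (y_{jk} - \beta_{jk})^2\, 1\{|y_{jk}(\lambda_\varepsilon)| > \lambda_\varepsilon\}\, 1\Big\{|\beta_{jk}(\lambda_\varepsilon)| > \frac{\lambda_\varepsilon}{2}\Big\}.$$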

For $D_1$ we use the Cauchy-Schwarz inequality:

E(y jk - fr k ) 2 l{\6 jk (\c)\ > y} < 2^(p(fe fc > y) 1/2 (Efe -ft fc ) 4 ) 1/2 






where $\mathbb{E}(y_{jk} - \beta_{jk})^4 = 3\varepsilon^4$ and
$P(|y_{jk} - \beta_{jk}| > \frac{\lambda_\varepsilon}{2}) \le \varepsilon^{m^2/8}$
(using the concentration properties of the Gaussian distribution). So, choosing
$m$ such that $m^2 > 48\eta$,



D t < C2^^ 1 ^ £ 2 l{|ft fe (A e )|<M^ 



J=0 fc 

c 2— Ar 16 



CA 



l + 2s 



For $D_2$, we use Proposition 5.2 with $r = \frac{2}{1+2s}$ and
$\lambda_0 = m\varepsilon_m\sqrt{\log(1/\varepsilon_m)}$:

■i=n t- 



3=0 fc 



]=0 k ^ J 
Looking at the bounds of $C_1$, $C_2$, $D_1$ and $D_2$, (8.2) is proved. □